Abstract



At ITW'10, Bringer et al. suggest strengthening their previous identification protocol by extending the Code Reverse Engineering (CRE) problem to identification codes. We first extend security results by Tillich et al. on this very problem. We then prove the security of this protocol using information-theoretic arguments.
1 Introduction

At Indocrypt'09, Bringer et al. [3] introduce a new identification protocol based on the use of identification codes [1]. Their proposal, denoted here the BCCK identification protocol, relies on a construction of identification codes by Moulin and Koetter [T^] using Reed-Solomon codes. In a few words, the BCCK identification protocol can be described as follows (cf. Figure 1, from [3]): a low-cost contactless device (CLD) and its reader want to authenticate each other.

The CLD stores two secret polynomials $P, P'$ of degree less than $k$, shared only with the reader. To authenticate itself to the CLD, the reader proves knowledge of $P$ by sending $(i, P(a_i))$, where $a_i$ is the $i$-th element of $\mathbb{F}_q$. The CLD proves its identity by replying with $P'(a_i)$.
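To make the exchange concrete, here is a minimal Python sketch of one BCCK session. It assumes a toy prime field ($q = 101$) in place of the field used in the paper, and evaluation points $a_i$ drawn at random and shared by both parties (an assumption of the sketch). The names `evaluate`, `Reader`, `CLD` and `run_session` are hypothetical; the code only illustrates the message flow, not a secure implementation.

```python
import random

# Toy parameters (hypothetical): a small prime field stands in for F_q,
# whereas the paper's example uses q = 2^64, n = 2^11, k = 2^8.
Q = 101
N_DOMAIN = 10
K = 3


def evaluate(coeffs, a, q=Q):
    """Evaluate sum(coeffs[j] * a**j) over F_q with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * a + c) % q
    return acc


class Party:
    """Secrets shared by a CLD and its reader: P, P' and the points a_i."""

    def __init__(self, P, P_prime, domain):
        self.P, self.P_prime, self.domain = P, P_prime, domain


class Reader(Party):
    def challenge(self, rng):
        # The reader proves knowledge of P by sending (i, P(a_i)).
        i = rng.randrange(len(self.domain))
        return i, evaluate(self.P, self.domain[i])

    def verify(self, i, reply):
        # The CLD is accepted if it answered with P'(a_i).
        return reply == evaluate(self.P_prime, self.domain[i])


class CLD(Party):
    def respond(self, i, value):
        # Reject the reader unless value == P(a_i); otherwise reply P'(a_i).
        if evaluate(self.P, self.domain[i]) != value:
            return None
        return evaluate(self.P_prime, self.domain[i])


def run_session(seed=0, forge=False):
    rng = random.Random(seed)
    P = [rng.randrange(Q) for _ in range(K)]
    P_prime = [rng.randrange(Q) for _ in range(K)]
    domain = rng.sample(range(Q), N_DOMAIN)
    reader, cld = Reader(P, P_prime, domain), CLD(P, P_prime, domain)
    i, value = reader.challenge(rng)
    if forge:
        value = (value + 1) % Q  # a wrong value for P(a_i)
    reply = cld.respond(i, value)
    return reply is not None and reader.verify(i, reply)
```

With `forge=True` the CLD receives a wrong value for $P(a_i)$ and refuses to answer, so the session fails.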
Figure 1: BCCK Identification Protocol [3]

[3] proves the security and privacy properties of the protocol relying on a classical cryptographic assumption known as the Polynomial Reconstruction (PR) problem [ZHlO].

Here we want to switch from this computational perspective to an information-theoretic one.

On the one hand, a first attempt was made in this direction by [11], where some recommendations are made to reach this more stringent goal. On the other hand, [2] introduces an extension of the BCCK identification protocol, simply assuming that the underlying Reed-Solomon codes remain unknown to the adversary, who thus has to solve the Code Reverse Engineering (CRE) problem [3H51fT5] to recover the initial parameters of the BCCK identification protocol. This restriction on what is available to the adversary does not modify the underlying structure of the BCCK identification protocol.

In this paper, we show that, in fact, an adversary cannot solve this CRE problem for identification codes. Our results are based on those of Tillich et al. We consider different cases, taking into account the noise over the channel and whether or not the adversary is able to isolate the communications of a CLD.

2 Related Works 

2.1 Code Reverse Engineering Problem 

The Code Reverse Engineering (CRE) problem [JHEKS] corresponds to the situation where an observer tries to retrieve information from an eavesdropped communication without any specific prior knowledge of the encoding representation of the transmitted data.

In the CRE problem, it is assumed that the adversary knows the length n of 
the encoded messages and a subset of codes of length n. Then by eavesdropping 
several messages over a noisy channel, he tries to determine from which code 
they are generated. 

Definition 1 (Code Reverse Engineering problem fB]) Let $\mathcal{C}$ be a family of codes of given length $n$ and given rate $R$.

• Let $C$ be a code chosen randomly in $\mathcal{C}$ and $x^1, \ldots, x^M$ be $M$ random codewords to be transmitted over the communication channel.

• Given the received words $y^1, \ldots, y^M$, the problem is to guess which $C$ has been used.

The difficulty depends on the number of received words, the level of noise and 
the rate of the codes. In [6], the authors analyze the number of eavesdropped 
messages that is needed to achieve a correct guess with good probability. 

Let $V$ and $W$ be two random variables. Let $H(V)$, respectively $H(V \mid W)$ and $I(V;W)$, denote the binary entropy of $V$, respectively the conditional entropy of $V$ given $W$ and the mutual information of $V$ and $W$. When assuming that the codewords are chosen independently and that the channel is memoryless, we have the following result.






Lemma 1 ([6]) The conditional entropy $H(C \mid y)$ of $C$ given $y = (y^1, \ldots, y^M)$ is lower bounded by

$$\log_2(\#\mathcal{C}) - M\bigl(I(x;y) - I(x;y \mid C)\bigr)$$

with $x$ one of the $x^j$ and $y$ the corresponding $y^j$.

This implies that the closer the conditional mutual information $I(x;y \mid C)$ is to $I(x;y)$, the harder it is to guess $C$.

Thanks to Fano's inequality, [5] also estimates how large $M$ should be to achieve a good guess of $C$ with negligible error probability when $n$ and $\#\mathcal{C}$ go to infinity; this needs

$$M \approx \frac{\log_2(\#\mathcal{C})}{I(x;y) - I(x;y \mid C)}.$$

This result is then exploited for linear codes with rate below the channel capacity (i.e. with overwhelming probability of decoding a received word correctly when $n$ is large) and for regular LDPC codes.
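The bound of Lemma 1 and the Fano-based estimate of $M$ are simple arithmetic once the two mutual informations are known. The sketch below assumes they are given in bits; the function names and the numbers in the usage lines are illustrative only.

```python
import math


def lemma1_bound(log2_num_codes, M, mi, mi_given_code):
    """Lemma 1: log2(#C) - M * (I(x;y) - I(x;y | C)), all in bits."""
    return log2_num_codes - M * (mi - mi_given_code)


def fano_messages_needed(log2_num_codes, mi, mi_given_code):
    """Order of magnitude M ~ log2(#C) / (I(x;y) - I(x;y | C))."""
    # When y depends on C only through x, the gap equals I(C; y).
    gap = mi - mi_given_code
    if gap <= 0:
        return math.inf  # a received word carries no information about C
    return log2_num_codes / gap


# Illustrative values only: 1000-bit code family, I(x;y) = 0.5, I(x;y|C) = 0.25.
print(fano_messages_needed(1000, 0.5, 0.25))  # 4000.0
print(lemma1_bound(1000, 1000, 0.5, 0.25))  # 750.0
```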

2.2 Identification Codes 

Let $\mathcal{X}, \mathcal{Y}$ be two alphabets, $\eta$ the message length, and $W$ a channel from $\mathcal{X}^\eta$ to $\mathcal{Y}^\eta$, defined as a conditional probability law: $W^\eta(y \mid x)$ is the probability of receiving a message $y \in \mathcal{Y}^\eta$ given a transmitted message $x \in \mathcal{X}^\eta$. By extension, for a given subset $E \subset \mathcal{Y}^\eta$, $W^\eta(E \mid x)$ is the probability of receiving a message belonging to $E$ when $x$ has been transmitted.

Definition 2 (Identification Code, [1]) A $(\eta, N, \lambda_1, \lambda_2)$-identification code from $\mathcal{X}$ to $\mathcal{Y}$ is given by a family $\{(Q(\cdot \mid i), D_i)\}_i$ with $i \in \{1, \ldots, N\}$ where:

• $Q(\cdot \mid i)$ is a probability distribution over $\mathcal{X}^\eta$ that encodes $i$ (the encoding set of $i$ is defined as the set of messages $x$ for which $Q(x \mid i) > 0$, in other words, the set of messages likely to encode $i$),

• $D_i \subset \mathcal{Y}^\eta$ is the decoding set,

• $\lambda_1$ and $\lambda_2$ are the first-kind and second-kind error rates, with

$$\lambda_1 \ge \sum_{x \in \mathcal{X}^\eta} Q(x \mid i)\, W^\eta(\overline{D_i} \mid x)$$

and

$$\lambda_2 \ge \sum_{x \in \mathcal{X}^\eta} Q(x \mid j)\, W^\eta(D_i \mid x)$$

(where $W^\eta(D_i \mid x)$ is the probability to be in the decoding set $D_i$ given a transmitted message $x$, and $W^\eta(\overline{D_i} \mid x)$ the probability to be outside the decoding set)

for all $i, j \in \{1, \ldots, N\}$ such that $i \neq j$.
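As an illustration of Definition 2, the following sketch computes the smallest admissible $\lambda_1$ and $\lambda_2$ of a small identification code over a noiseless channel, where $W^\eta(D \mid x)$ is $1$ if $x \in D$ and $0$ otherwise. The toy code with three messages over $\{0,1\}^2$ and the function name `error_rates_noiseless` are hypothetical.

```python
from fractions import Fraction


def error_rates_noiseless(Q, D):
    """Smallest admissible (lambda_1, lambda_2) of Definition 2, noiseless channel.

    Q[i] maps each codeword x to Q(x | i); D[i] is the decoding set of i.
    lambda_1 is the worst probability of landing outside D_i when encoding i;
    lambda_2 is the worst probability of landing in D_i when encoding j != i.
    """
    lam1 = max(sum(p for x, p in Q[i].items() if x not in D[i]) for i in Q)
    lam2 = max(
        sum(p for x, p in Q[j].items() if x in D[i])
        for i in Q for j in Q if i != j
    )
    return lam1, lam2


half = Fraction(1, 2)
# Hypothetical toy code over {0,1}^2 with overlapping encoding sets.
Q = {
    1: {"00": half, "01": half},
    2: {"01": half, "10": half},
    3: {"10": half, "11": half},
}
D = {i: set(Q[i]) for i in Q}  # decode i on its own encoding set
lam1, lam2 = error_rates_noiseless(Q, D)
print(lam1, lam2)  # 0 1/2
```

Decoding each message on its own encoding set makes $\lambda_1 = 0$ on a noiseless channel; the overlaps between encoding sets are what drive $\lambda_2$.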
To simplify the discussion in the next sections, we restrict ourselves to the case where the error-rate inequalities above are in fact equalities (i.e. the error rates are the same for all choices of $i$ and $j$).

Moulin and Koetter introduce in [T^] the following identification code based 
on Reed-Solomon codes. 

A Reed-Solomon code over a finite field $\mathbb{F}_q$, of length $n \le q-1$ and dimension $k$, is the set of the evaluations of all polynomials $P \in \mathbb{F}_q[X]$ of degree at most $k-1$ over a subset $F \subset \mathbb{F}_q$ of size $n$ ($F = \{a_1, \ldots, a_n\}$). In other words, for each $k$-tuple $(x_0, \ldots, x_{k-1}) \in \mathbb{F}_q^k$, the corresponding Reed-Solomon word is the $n$-tuple $(y_1, \ldots, y_n)$ where $y_i = \sum_{j=0}^{k-1} x_j a_i^j$. In the sequel, we identify a source word $(x_0, \ldots, x_{k-1}) \in \mathbb{F}_q^k$ with the corresponding polynomial $P(X) = \sum_{j=0}^{k-1} x_j X^j$.


Definition 3 (Moulin-Koetter RS-Identification Codes) Let $\mathbb{F}_q$ be a finite field of size $q$, $k \le n \le q-1$, and an evaluation domain $F = \{a_1, \ldots, a_n\} \subset \mathbb{F}_q$.

Consider the collection $A_{F,P} = \{(i, P(a_i)) \mid i \in \{1, \ldots, n\}\}$ for $P$ any polynomial on $\mathbb{F}_q$ of degree at most $k-1$.

Then the Moulin-Koetter RS-Identification Codes are defined by:

• their encoding distribution $Q(\cdot \mid P)$, which is taken as the uniform distribution over $A_{F,P}$,

• their family of encoding and decoding sets $\{(A_{F,P}, A_{F,P})\}_{P \in \mathbb{F}_q[X],\, \deg P < k}$.

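From the definition and the fact that Reed-Solomon codes are Maximum Distance Separable (two distinct polynomials of degree at most $k-1$ agree on at most $k-1$ points), this leads to $(\eta = \log_2 n + \log_2 q, N = q^k, \lambda_1 = 0, \lambda_2 = (k-1)/n)$ identification codes from $\{0,1\}$ to $\{0,1\}$.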
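The values $\lambda_1 = 0$ and $\lambda_2 = (k-1)/n$ can be checked by brute force on a toy instance. The sketch below assumes a prime field $\mathbb{F}_7$ (so field arithmetic is plain modular arithmetic), $n = 5$, $k = 2$ and a noiseless channel; indices start at 0 and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def evaluate(coeffs, a, q):
    """Evaluate sum(coeffs[j] * a**j) mod q with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * a + c) % q
    return acc


def encoding_set(P, domain, q):
    """A_{F,P} = {(i, P(a_i))}, with indices i starting at 0."""
    return frozenset((i, evaluate(P, a, q)) for i, a in enumerate(domain))


def mk_error_rates(q, domain, k):
    """Error rates of the toy Moulin-Koetter code on a noiseless channel.

    Q(.|P) is uniform on A_{F,P} and the decoding set of P is A_{F,P}, so
    lambda_2 = max over P != R of |A_{F,P} & A_{F,R}| / n.
    """
    n = len(domain)
    sets = {P: encoding_set(P, domain, q) for P in product(range(q), repeat=k)}
    # Every codeword of P lies in P's own decoding set: no first-kind error.
    lam1 = Fraction(0)
    lam2 = max(
        Fraction(len(sets[P] & sets[R]), n)
        for P in sets for R in sets if P != R
    )
    return lam1, lam2


# Toy parameters (hypothetical): q = 7, n = 5, k = 2.
lam1, lam2 = mk_error_rates(7, [1, 2, 3, 4, 5], 2)
print(lam1, lam2)  # 0 1/5
```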

Example 1 Throughout this paper, we take the parameters suggested in [3] for the identification protocol of Figure 1: $q = 2^{64}$, $n = 2^{11}$, $k = 2^8$.

In the original paper, there is a limit on the number of times a CLD can be identified by a reader: [3] recommends that the same CLD be interrogated at most 2048 times.

3 CRE for identification codes 

In this section, we show how to extend the Code Reverse Engineering problem and the bounds from Section 2.1 to the case of identification codes. This is the first important contribution of this paper.

Definition 1, stated in the context of transmission codes, gives the following definition for identification codes.

Definition 4 (Identification CRE problem) Let $\mathcal{C}$ be a family of identification codes (cf. Definition 2) of given parameters $(\eta, N, \lambda_1, \lambda_2)$ from the alphabet $\mathcal{X}$ to the alphabet $\mathcal{Y}$.






• Let $C = \{(Q(\cdot \mid i), D_i)\}_{i \in \{1, \ldots, N\}}$ be a code chosen randomly in $\mathcal{C}$ and $i = (i^1, \ldots, i^M)$ be $M$ random messages to be encoded over the channel.

• Given the received messages $x = (x^1, \ldots, x^M)$, the problem is to guess which $C$ has been used.

Remark 1 Note that we modify the original problem, replacing the encoded 
messages by the messages to be encoded, to be able to address the case without 
errors in the BCCK identification protocol. 

With a memoryless channel, Lemma 1 is adapted accordingly as follows.

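Lemma 2 For independent choices of $i = (i^1, \ldots, i^M)$, the conditional entropy $H(C \mid x)$ of the identification code $C$ given the received messages $x = (x^1, \ldots, x^M)$ is lower bounded by

$$\log_2(\#\mathcal{C}) - M\bigl(I(i;x) + I(x;C \mid i) - I(i;x \mid C)\bigr) \qquad (1)$$

i.e.

$$\log_2(\#\mathcal{C}) - M\bigl(H(x) - H(i) + H(i \mid C, x) - H(x \mid C, i)\bigr) \qquad (2)$$

where, as in Lemma 1, the terms inside the parentheses refer to a single message $i^j$ and the corresponding received message $x^j$.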

Proof. As for the original proof of Lemma 1, this is based on the relation $I(x; i, C) = I(i;x) + I(x;C \mid i) = I(x;C) + I(i;x \mid C)$, where here $I(x;C \mid i)$ is not equal to 0. As $H(i) = H(i \mid C)$ (i.e. $I(i;C) = 0$, so that $I(i;x) - I(i;x \mid C) = -I(i;C \mid x)$), this leads to $\log_2(\#\mathcal{C}) - M\bigl(I(x;C \mid i) - I(i;C \mid x)\bigr)$. The difference $I(x;C \mid i) - I(i;C \mid x)$ can also be simplified into $H(x) - H(i) + H(i \mid C, x) - H(x \mid C, i)$, as $H(x \mid i) - H(i \mid x) = H(x) - H(i)$. □
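A quick numerical sanity check of the identity behind Lemma 2, for a single message ($M = 1$), where the bound is in fact an equality: $H(C \mid x) = \log_2(\#\mathcal{C}) - \bigl(H(x) - H(i) + H(i \mid C, x) - H(x \mid C, i)\bigr)$ whenever $C$ is uniform and $i$ is independent of $C$. The random channel $W(x \mid C, i)$, the alphabet sizes and the helper names are arbitrary choices for this test.

```python
import itertools
import math
import random
from collections import defaultdict

C_AX, I_AX, X_AX = 0, 1, 2  # positions of C, i and x in each outcome tuple


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axes):
    """Marginal pmf of the variables at the given positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))


def random_joint(num_codes=3, num_msgs=4, num_obs=5, seed=1):
    """p(C, i, x) with C uniform, i uniform and independent of C, x ~ W(x | C, i)."""
    rng = random.Random(seed)
    joint = {}
    for c, i in itertools.product(range(num_codes), range(num_msgs)):
        weights = [rng.random() + 0.01 for _ in range(num_obs)]
        total = sum(weights)
        for x, w in enumerate(weights):
            joint[(c, i, x)] = w / (total * num_codes * num_msgs)
    return joint


joint = random_joint()
h_c_given_x = cond_entropy(joint, [C_AX], [X_AX])
bracket = (entropy(marginal(joint, [X_AX])) - entropy(marginal(joint, [I_AX]))
           + cond_entropy(joint, [I_AX], [C_AX, X_AX])
           - cond_entropy(joint, [X_AX], [C_AX, I_AX]))
rhs = math.log2(3) - bracket  # log2(#C) with #C = 3 codes
print(h_c_given_x, rhs)
```

The bracket equals $I(x;C)$ exactly here, so both printed values agree up to rounding; for $M > 1$ independent messages the lemma only claims an inequality.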

One important difference from the CRE problem for transmission codes is that the solution is not trivial for a noiseless channel. This is due to the first-kind ($\lambda_1$) and second-kind ($\lambda_2$) error rates of identification codes: if at least one error rate is nonzero, then $I(i;x) < H(i)$, whereas the mutual information between a message and the received message would have been maximal with a transmission code. Intuitively, what makes the problem harder for identification codes is that the quantity of transmitted information can be very low.

Assume that the distribution is regular in the error-rate inequalities of Definition 2 (the same probability $W$ holds for all $x$):

• The first-kind error rate $\lambda_1$ implies that, for a given $i$, we have a probability $\lambda_1$ of drawing an $x$ outside the decoding set of $i$. This means that $H(i \mid x, C)$ would be almost equal to $H(i)$. Consequently, with probability $\lambda_1$ we have

$$I(i;x) \approx I(i;x \mid C).$$

• The second-kind error rate means that, for another given message $j$, we have a probability $\lambda_2$ of also having $x \in D_j$. This implies that $H(i \mid x, C)$ may remain high.

Corollary 1 Let $\mathcal{X} = \mathcal{Y} = \{0,1\}$. For a constant size of the encoding sets

$$\#\{x \mid Q(x \mid i) > 0\} = \tau(N)\, 2^\eta$$

in the context of a noiseless channel, the lower bound of Eq. (2) can itself be lower bounded by

$$\log_2(\#\mathcal{C}) - M \log_2\bigl(1/\tau(N)\bigr) \qquad (3)$$

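Proof. Eq. (2) becomes

$$\log_2(\#\mathcal{C}) - M\bigl(H(x) - \log_2 N + H(i \mid C, x) - \log_2 \tau(N) 2^\eta\bigr)$$

and we have $H(x) \le \eta$, $H(i) = \log_2 N$, $H(x \mid C, i) = \log_2 \tau(N) 2^\eta$, and $\log_2 N \ge H(i \mid x, C)$. Hence the term multiplied by $M$ is at most $\eta - \log_2 N + \log_2 N - \log_2 \tau(N) - \eta = \log_2(1/\tau(N))$. □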
We see that the greater $\tau(N)$ is, the greater expression (3) is. By using Fano's inequality as in [6], we obtain that we need

$$M \approx \frac{\log_2(\#\mathcal{C})}{\log_2\bigl(1/\tau(N)\bigr)}$$

to guess $C$ with negligible error probability when $n$ and $\#\mathcal{C}$ go to infinity. $M$ grows quickly to infinity as soon as $\log_2(1/\tau(N))$ is negligible compared to $\log_2(\#\mathcal{C})$.
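The bound (3) and the resulting estimate of $M$ are straightforward arithmetic. This sketch assumes $\log_2(\#\mathcal{C})$ and $\tau(N)$ are known; as an illustration it uses $\tau(N) = 1/q$ with $q = 2^{64}$ and $\log_2(\#\mathcal{C}) \approx 2^{17}$, the Moulin-Koetter setting with the parameters of Example 1 studied in Section 4. The function names are hypothetical.

```python
import math


def corollary1_bound(log2_num_codes, M, tau):
    """Lower bound (3): log2(#C) - M * log2(1 / tau(N)), noiseless channel."""
    return log2_num_codes - M * math.log2(1 / tau)


def messages_needed(log2_num_codes, tau):
    """Fano-based estimate M ~ log2(#C) / log2(1 / tau(N))."""
    return log2_num_codes / math.log2(1 / tau)


tau = 2.0 ** -64  # tau(N) = 1/q for q = 2^64
print(messages_needed(2**17, tau))  # 2048.0
print(corollary1_bound(2**17, 1024, tau))  # 65536.0
```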

In case of additional noise on the communication channel, the difficulty will 
increase with the level of noise (as for the classical CRE problem). 



4 Application to BCCK protocol 

Now comes the main contribution of our work: we use the CRE problem for 
identification codes to study the security of the BCCK identification protocol 
from an information theory perspective. 

[TT] suggests increasing the security of the identification protocol [3], which relies on the Polynomial Reconstruction problem, by additionally exploiting the Code Reverse Engineering (CRE) problem. The goal is to further restrict the information available to an eavesdropper or an active adversary.

Let $\mathbb{F}_q$ be a finite field of size $q$ and $k \le n \le q-1$; we define $\mathcal{C}$ as the set of the Moulin-Koetter $(\eta = \log_2 n + \log_2 q, N = q^k, \lambda_1 = 0, \lambda_2 = (k-1)/n)$ identification codes. Following Definition 3, an identification code $C \in \mathcal{C}$ is defined according to some evaluation domain $F_C = \{a_{C,1}, \ldots, a_{C,n}\} \subset \mathbb{F}_q$ with

• a family of encoding and decoding sets $\{(A_{C,P}, A_{C,P})\}_{P \in \mathbb{F}_q[X],\, \deg P < k}$,

• where $A_{C,P} = \{(j, P(a_{C,j})) \mid j \in \{1, \ldots, n\}\}$.

Hence, a random code $C$ is determined by the random choice of $n$ distinct elements of $\mathbb{F}_q$, in a given order. The size of $\mathcal{C}$ is $\binom{q}{n} n!$.

Lemma 2 and the same analysis as for Eq. (3), with $\tau(N) = 1/q$, lead to the following result.

Corollary 2 $H(C \mid x^1, \ldots, x^M) \ge \log_2 \binom{q}{n} n! - M \log_2 q$

This underlines the difficulty of the CRE problem in this setting when $n$ grows to infinity, for $M$ polynomial in $\log_2 n$: via Stirling's formula, $\log_2 \binom{q}{n} n!$ is approximately $q \log_2\bigl(q/(q-n)\bigr) + n \log_2(q-n) - n \log_2 e$, which is $\Omega(n)$.
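The Stirling approximation can be compared with the exact value $\log_2 \binom{q}{n} n! = \sum_{j=0}^{n-1} \log_2(q-j)$. The sketch assumes the Stirling form $\log_2 m! \approx m \log_2 m - m \log_2 e$ and uses moderate toy sizes ($q = 2^{20}$, $n = 2^{10}$) so that floating-point rounding stays harmless; the function names are hypothetical.

```python
import math


def log2_num_codes(q, n):
    """Exact log2(binom(q, n) * n!) = sum_{j=0}^{n-1} log2(q - j)."""
    return sum(math.log2(q - j) for j in range(n))


def stirling_approx(q, n):
    """q log2(q/(q-n)) + n log2(q-n) - n log2(e), from Stirling's formula."""
    return (q * math.log2(q / (q - n)) + n * math.log2(q - n)
            - n * math.log2(math.e))


exact = log2_num_codes(2**20, 2**10)
approx = stirling_approx(2**20, 2**10)
print(exact, approx)
```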
Example 2 Taking the values from Example 1, this leads to $(75, 2^{16384}, 0, 255/2048 \approx 1/8)$-identification codes. This gives:

$$H(C \mid x^1, \ldots, x^M) \ge \log_2 \binom{2^{64}}{2^{11}} 2^{11}! - 64 \times M.$$

As $\log_2 \binom{2^{64}}{2^{11}} 2^{11}!$ is approximately equal to $2^{17}$, taking $M = 2048$ makes the lower bound useless.
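The figures of Example 2 can be reproduced with a few lines of Python, computing $\log_2 \binom{q}{n} n!$ as $\sum_{j=0}^{n-1} \log_2(q-j)$; at this scale each term rounds to $64$ in double precision, which is harmless here. The sketch takes $\lambda_2 = (k-1)/n$, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

q, n, k = 2**64, 2**11, 2**8  # parameters of Example 1


def log2_num_codes(q, n):
    """log2(#C) = log2(binom(q, n) * n!) = sum_{j=0}^{n-1} log2(q - j)."""
    return sum(math.log2(q - j) for j in range(n))


eta = math.log2(n) + math.log2(q)  # message length: 75 bits
log2_N = k * math.log2(q)  # N = q^k = 2^16384
lambda2 = Fraction(k - 1, n)  # 255/2048, close to 1/8

log2_C = log2_num_codes(q, n)  # about 2^17


def corollary2_bound(M):
    """Corollary 2: log2(#C) - M * log2(q)."""
    return log2_C - M * math.log2(q)


print(eta, log2_N, lambda2, log2_C, corollary2_bound(2048))
```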

Consequently, we need a more stringent result than the one we just obtained. This motivates the introduction of a new, tighter lower bound.

With the following result, which is specific to the Moulin-Koetter construction, we prove that an adversary gains no information in this situation.

Proposition 1 For the family $\mathcal{C}$ of Moulin-Koetter identification codes defined as above over $\mathbb{F}_q$ with $k \le n \le q-1$, for independent choices of $M$ messages $P^1, \ldots, P^M$ to be encoded and for a random choice of $C \in \mathcal{C}$, we have

$$H(C \mid x^1, \ldots, x^M) = \log_2 \binom{q}{n} n!$$

where $x^1, \ldots, x^M$ are the received messages, independently and randomly chosen in the encoding sets of $P^1, \ldots, P^M$, and eavesdropped by the adversary (here without noise).

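Proof. We have $H(x) = \log_2 n + \log_2 q$, $H(P) = \log_2 N = k \log_2 q$, $H(x \mid C, P) = \log_2 n$, and we know that the uncertainty on $P$ knowing $x$ and $C$ corresponds to the choice of a polynomial of degree at most $k-2$, i.e. $H(P \mid x, C) = (k-1)\log_2 q$. Substituting these values into Eq. (2), the term multiplied by $M$ vanishes, so $H(C \mid x^1, \ldots, x^M) \ge \log_2(\#\mathcal{C})$; equality follows since $H(C \mid x^1, \ldots, x^M) \le H(C) = \log_2(\#\mathcal{C})$.
□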
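Proposition 1 can also be checked exhaustively on a toy family for a single eavesdropped message ($M = 1$). The sketch assumes a prime $q = 5$, $n = 3$, $k = 2$ and uniform, independent choices of the ordered domain, the polynomial and the index; it computes $H(C \mid x)$ exactly (with fractions) and compares it with $\log_2 \binom{q}{n} n! = \log_2 60$. The helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict
from fractions import Fraction


def evaluate(coeffs, a, q):
    """Evaluate sum(coeffs[j] * a**j) mod q with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * a + c) % q
    return acc


def entropy_code_given_observation(q, n, k):
    """Exact H(C | x) for one eavesdropped message x = (j, P(a_{C,j})).

    C ranges over ordered evaluation domains (n distinct points of F_q),
    P over polynomials of degree < k, j over indices 0..n-1; all uniform.
    """
    domains = list(itertools.permutations(range(q), n))
    polys = list(itertools.product(range(q), repeat=k))
    weight = Fraction(1, len(domains) * len(polys) * n)
    joint = defaultdict(Fraction)  # p(C, x)
    for C in domains:
        for P in polys:
            for j, a in enumerate(C):
                joint[(C, (j, evaluate(P, a, q)))] += weight
    p_x = defaultdict(Fraction)
    for (_, x), p in joint.items():
        p_x[x] += p
    # H(C | x) = -sum_{C,x} p(C, x) log2 p(C | x)
    return -sum(float(p) * math.log2(float(p / p_x[x]))
                for (_, x), p in joint.items())


q, n, k = 5, 3, 2  # toy parameters (hypothetical), q prime
h = entropy_code_given_observation(q, n, k)
log2_num_codes = math.log2(math.perm(q, n))  # log2(binom(q, n) n!) = log2(60)
print(h, log2_num_codes)
```

The random constant term of $P$ makes the observed value uniform whatever $C$ is, which is why the observation tells the adversary nothing about the domain.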

4.1 Validity of the independence assumption 

In the BCCK identification protocol, the messages to be encoded and the encoding messages are not fully independent, due:

1. to the relatively small size of the encoding sets and the correlation between them (one can detect whether the same index $j$ is used to encode independent polynomials),

2. to the potential ability of an adversary to detect whether two encoding messages are sent to the same CLD (i.e. that they are related to the same polynomial/message to be encoded).

We now study the impact on the previous estimation.

A more general version of Lemma 2 is given below for the situation where independent choices are not required.
Lemma 3 Let $C \in \mathcal{C}$ be an identification code. Let $i = (i^1, \ldots, i^M)$ be the variable associated with $M$ messages to be encoded and $x$ the variable for the corresponding received messages. We have

$$H(C \mid x) \ge \log_2(\#\mathcal{C}) - \bigl(H(x) - H(i) + H(i \mid C, x) - H(x \mid C, i)\bigr) \qquad (4)$$

As $H(i)$ is always greater than or equal to $H(i \mid C, x)$, we obtain

$$H(C \mid x) \ge \log_2(\#\mathcal{C}) - \bigl(H(x) - H(x \mid C, i)\bigr).$$

Moreover, $H(x) - H(x \mid C, i) \le H(x) \le M \times H(x^1)$, where $x^1$ denotes a single received message, thus

$$H(C \mid x) \ge \log_2(\#\mathcal{C}) - M \times H(x^1) \qquad (5)$$

With the parameters of the identification protocol, $H(x^1) \le \log_2 n + \log_2 q$, and we deduce

$$H(C \mid x) \ge \log_2 \binom{q}{n} n! - M(\log_2 n + \log_2 q).$$

This means that the adversary's knowledge of the code $C \in \mathcal{C}$ in use is still negligible when $n$ grows to infinity, for $M$ polynomial in $\log_2 n$.

Example 3 For the parameters from Example 1, $q = 2^{64}$, $n = 2^{11}$, $k = 2^8$, the lower bound is

$$\log_2 \binom{2^{64}}{2^{11}} 2^{11}! - 75 \times M,$$

which remains positive only for $M \le 1747$.
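Example 3 amounts to finding the largest $M$ for which the bound stays positive. A minimal check, evaluating $\log_2(\#\mathcal{C})$ as $\sum_{j=0}^{n-1} \log_2(q-j)$ (helper names hypothetical):

```python
import math

q, n = 2**64, 2**11  # parameters of Example 1

log2_C = sum(math.log2(q - j) for j in range(n))  # log2(binom(q, n) n!)
per_message = math.log2(n) + math.log2(q)  # bound on H(x^1): 75 bits


def lemma3_bound(M):
    """Bound (5) with the protocol parameters: log2(#C) - M * 75."""
    return log2_C - M * per_message


# Largest M for which the lower bound is still positive.
M_max = math.ceil(log2_C / per_message) - 1
print(M_max)  # 1747
```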

We now study a tighter estimation directly from Eq. (4). To model the situation where the adversary is able to determine whether the same CLD is targeted (for instance by capturing it), assume that in $i = (i^1, \ldots, i^M)$, each message $i = P$ is repeated exactly $\ell$ times. We assume that each time a different encoded message $x = (j, P(a_j))$ is used. For $\ell \le k$, we obtain the same result as before (cf. Proposition 1), whereas the entropy decreases for $\ell > k$.

Proposition 2 For $\ell \le k$, $H(C \mid x) = \log_2 \binom{q}{n} n!$.
For $\ell > k$, $H(C \mid x) \ge \log_2 \binom{q}{n} n! - \frac{M}{\ell}(\ell - k)\log_2 q$.

Proof. We apply Eq. (4) with the following values:

• $H(x) = \frac{M}{\ell} \sum_{j=0}^{\ell-1} \bigl(\log_2(n-j) + \log_2 q\bigr)$,

• $H(i) = \frac{M}{\ell} \log_2 N = \frac{M}{\ell} k \log_2 q$,

• $H(i \mid C, x) = \frac{M}{\ell}(k - \ell)\log_2 q$ if $\ell \le k$, and $0$ otherwise,

• $H(x \mid C, i) = \frac{M}{\ell} \sum_{j=0}^{\ell-1} \log_2(n-j)$.

□
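The four entropies in the proof of Proposition 2 can be plugged into the bracket of Eq. (4) directly. The sketch assumes $M$ is a multiple of $\ell$ and $\ell \le n$; the second toy instance ($q = 2^8$, $n = 16$, $k = 4$, $\ell = 8$, $M = 16$) is hypothetical and should leak $(M/\ell)(\ell - k)\log_2 q = 64$ bits. The function names are hypothetical as well.

```python
import math


def prop2_entropies(M, ell, q, n, k):
    """The four entropies used in the proof of Proposition 2 (in bits).

    M messages, each distinct message P repeated exactly ell times with
    distinct indices j; requires ell <= n and M a multiple of ell.
    """
    groups = M / ell
    log_q = math.log2(q)
    h_x = groups * sum(math.log2(n - j) + log_q for j in range(ell))
    h_i = groups * k * log_q
    h_i_given_cx = groups * (k - ell) * log_q if ell <= k else 0.0
    h_x_given_ci = groups * sum(math.log2(n - j) for j in range(ell))
    return h_x, h_i, h_i_given_cx, h_x_given_ci


def information_leak(M, ell, q, n, k):
    """Bracket of Eq. (4): H(x) - H(i) + H(i|C,x) - H(x|C,i)."""
    h_x, h_i, h_i_cx, h_x_ci = prop2_entropies(M, ell, q, n, k)
    return h_x - h_i + h_i_cx - h_x_ci


print(information_leak(12, 3, 2**64, 2**11, 2**8))  # about 0 (ell <= k)
print(information_leak(16, 8, 2**8, 16, 4))  # about 64 (ell > k)
```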

Following intuition, with non-independent messages, the adversary gains some knowledge about the chosen code $C \in \mathcal{C}$ only when the order of repetition is strictly greater than $k$.

Example 4 We take the same parameters as in Example 1, $q = 2^{64}$, $n = 2^{11}$, $k = 2^8$. The lower bound is

$$\log_2 \binom{2^{64}}{2^{11}} 2^{11}! - \frac{M}{\ell}(\ell - 2^8) \times 64.$$

For instance, this remains high until about $M < 2^{19}$ with $\ell = 257$ and $M < 2^{12}$ for $\ell = 512$. The number of messages $M$ needed to make the lower bound low decreases to approximately 2341 as $\ell$ increases to $n$.
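The thresholds quoted in Example 4 follow from solving for the $M$ at which the Proposition 2 bound vanishes, $M^\ast = \log_2(\#\mathcal{C})\,\ell / \bigl((\ell - k)\log_2 q\bigr)$. A short sketch with the parameters of Example 1 (helper names hypothetical):

```python
import math

q, n, k = 2**64, 2**11, 2**8  # parameters of Example 1
log2_C = sum(math.log2(q - j) for j in range(n))  # about 2^17


def prop2_bound(M, ell):
    """Proposition 2 lower bound on H(C | x) for repetition order ell > k."""
    return log2_C - (M / ell) * (ell - k) * math.log2(q)


def threshold(ell):
    """Value of M at which the Proposition 2 lower bound reaches zero."""
    return log2_C * ell / ((ell - k) * math.log2(q))


for ell in (257, 512, n):
    print(ell, threshold(ell))
```

For $\ell = 257$ this gives $M^\ast = 526336 \approx 2^{19}$, for $\ell = 512$ exactly $2^{12}$, and for $\ell = n$ about $2340.6$; at $M = 2340$ with $\ell = n$ the bound is only about 32 bits.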

On the one hand, this can be interpreted as follows. A passive eavesdropping of BCCK identifications of CLDs that are each interrogated 2048 times ($\ell = n = 2048$) may enable an adversary to get almost all information on the underlying code when $M = 2341$. On the other hand, note that with only 2340 such eavesdroppings, the lower bound on the adversary's entropy is still positive.